HP Confidential
Hewlett-Packard Company

Netware/iX Performance Patch
LAN Labs Benchmark Results
Version: July 24, 1992

Summary

This report summarizes the performance testing of the Netware/iX Performance Patch using the LAN Labs Benchmarks from PC Magazine. These results are correlated with previous testing on earlier versions of Netware/iX, on Netware/9000, and on Netware running on Intel platforms.

Results

Overall there have been major improvements in Netware/iX's file sharing performance, which met the goals set for this product. In the single user case (called "no load" in the accompanying tables and graphs) Netware/iX is able to deliver 81% of the performance of a Netware 3.11 server running on a Vectra 486/25. CPU utilization was at 5%.

The best results, however, are seen under load conditions. As benchmark clients were added to Netware/iX running on a 917, it degraded more slowly than any other platform. At five benchmark client loads it was the best Netware server tested. Specifically, it was marginally better than the 486 and 386 servers, 28% better than the HP9000 720, and 265% better than the HP9000 807 running Netware/9000.

Your Mileage May Vary

These results are reproducible on the test platform described later in this document; however, the reader should be aware that a specific user environment may see either better or worse results than those described here, depending on what the user is doing and on the environment of the system. The interpretation section below should help in understanding what conditions may lead to different results. The improvements in Netware/iX will only be available on an NIO system. The best performance will be seen on NOVA class SPUs.

Put All Your Eggs in One Basket, and Watch That Basket

The goal of the Netware/iX Performance Patch was to improve file server performance, specifically the read/write path, since the vast majority of user work is done there. The SPX IPC API, the printer spooler, and the directory functions were improved somewhat, but not to the degree of the read/write path improvements.

Measured Systems

The following tables and graphs contain measurements from previous tests as well as the results being reported here. The entry "917 FP" indicates Netware/iX with the "Fast Path" code turned on; this is the Netware/iX Performance Patch software. Previous tests using the "917 SP" and "967 SP" were done earlier this year and used the "Slow Path" code prior to the performance patch software. Those results are reported to show the improvement of the Fast Path code. For comparison, the benchmarks were run against a 386 and a 486 running Netware 3.11, the current product from Novell, as well as against the HP9000 807 (which uses the same processor board as the 917) and the 837 (which uses the Nova 2.3 processor board). The number of loads ranged from a single measurement system (the "no load" case) to the measurement system plus ten additional loads.

Throughput

Since throughput is usually the first thing we are asked about, here are the numbers. There are several things which can be seen here. First, as mentioned above, in the single user case (called "no load") the performance of Netware/iX in this benchmark is close to the 486, the 386, and the HP9000 720, while it is better than the HP9000 837, 817, and 807 (which uses the same processor board as the 917). The better news is that at five loads Netware/iX outperforms the 720 and marginally exceeds the 486.
In the accompanying tables, the systems tested are listed in order of best performance for the five load case. "FP" stands for fast path, or the Netware/iX Performance Patch, and "SP" stands for slow path, or the Netware/iX product prior to the performance enhancements.

LAN Labs PC Benchmark Tests
Throughput (KBytes/sec)

Loads      917 FP  486/25  386/25   720   837   817   807  917 SP
No Load       283     349     329   366   251   219   185     115
1             256     315     292   304   215   184   159      72
2             241     288     250   234   186   162   121      47
3             234     250     238   238   167   146    92      34
4             222     226     210   224   151   133    72      27
5             219     211     198   171   141   131    60      22
6             218       -       -   165   135   113     -       -
10            204       -       -   159   121     -     -       -

Italicized entries are extrapolations.

CPU Utilization

The observed CPU utilization is documented in the following table and graph. In the table the systems tested are listed in order of best performance for the five load case. The important item to note is that the 917 has lower CPU utilization than the 807, which is its peer with regard to processor board. The only metric where the 837 outperformed the 917 FP was CPU utilization, which is to be expected since the 837 uses the Nova 2.3 SPU as compared to the 1.0 SPU of the 917 and 807. The very low CPU utilization of the 486 under load indicates that, for the platform being tested, there is a bottleneck other than the CPU which is limiting server performance.

LAN Labs PC Benchmark Tests
CPU Utilization

Loads      486/25   837  386/25  917 FP   817   720   807  917 SP
No Load       28%   20%     50%      5%   33%   43%   50%     80%
1             29%   38%     56%     29%   55%   66%   75%    100%
2             30%   50%     62%     55%   78%   82%   98%    100%
3             31%   58%     68%     63%   85%   88%  100%    100%
4             32%   70%     73%     77%   92%   90%  100%    100%
5             33%   72%     80%     86%   95%   97%  100%    100%
6               -   72%       -     88%   99%   99%     -       -
10              -   77%       -    100%     -  100%     -       -

Italicized entries are extrapolations.

Server Degradation with Offered Load

The observed client throughput degradation is documented in the following table and graph. The systems tested are listed in order of best performance for the five load case. This measures the rate at which the server slows down due to increased load. The important item to note here is that the 917 FP degrades more slowly than all other platforms. This indicates that, until the SPU reaches near 100% CPU utilization, the 917 FP will continue to gain on and surpass the performance of the 486 which was tested. This is good news for the scalability of Netware/iX on the high end Nova boxes, where additional CPU is available to the product beyond the 917's capacity.

LAN Labs PC Benchmark Tests
Throughput Degradation as Compared to No Load

Loads      917 FP  486/25  386/25   817   837   720   807  917 SP
No Load      100%    100%    100%  100%  100%  100%  100%    100%
1             91%     90%     89%   84%   86%   83%   86%     63%
2             85%     82%     76%   74%   74%   64%   65%     41%
3             83%     72%     72%   67%   66%   65%   50%     30%
4             78%     65%     64%   61%   60%   61%   39%     24%
5             77%     61%     60%   60%   56%   47%   32%     20%
6             77%       -       -   52%   54%   45%     -       -
10            72%       -       -     -   48%   43%     -       -

Italicized entries are extrapolations.
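The degradation percentages in the table above are simply each platform's throughput at a given load expressed as a percentage of its own no-load throughput, and the comparisons quoted in the Results section (81% of the 486 at no load, 28% better than the 720 and 265% better than the 807 at five loads) are ratios taken from the throughput table. The short C program below is not part of the benchmark suite; it is only a convenience sketch that reproduces those figures from the published numbers.

    #include <stdio.h>

    /*
     * Cross-check of the published ratios.  The throughput figures are copied
     * from the throughput table above (KBytes/sec, no load through 5 loads);
     * only the platforms quoted in the Results section are included.  Output
     * agrees with the degradation table to within a point of rounding.
     */
    struct platform { const char *name; double tput[6]; };

    static const struct platform servers[] = {
        { "917 FP", { 283, 256, 241, 234, 222, 219 } },
        { "486/25", { 349, 315, 288, 250, 226, 211 } },
        { "720",    { 366, 304, 234, 238, 224, 171 } },
        { "807",    { 185, 159, 121,  92,  72,  60 } },
    };

    int main(void)
    {
        int i, load;

        /* Degradation: throughput at N loads as a percentage of no-load throughput. */
        for (i = 0; i < 4; i++) {
            printf("%-7s", servers[i].name);
            for (load = 0; load <= 5; load++)
                printf(" %4.0f%%", 100.0 * servers[i].tput[load] / servers[i].tput[0]);
            printf("\n");
        }

        /* Relative comparisons quoted in the Results section. */
        printf("No load, 917 FP vs 486/25:  %.0f%% of the 486\n", 100.0 * 283 / 349);
        printf("5 loads, 917 FP vs 720:    +%.0f%%\n", 100.0 * (219.0 / 171 - 1.0));
        printf("5 loads, 917 FP vs 807:    +%.0f%%\n", 100.0 * (219.0 / 60 - 1.0));
        return 0;
    }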
Improvement

The following graph shows that our improvement over the slow path product scales dramatically. There are two reasons for this. First, under the previous product we consumed the 917 CPU with a single load, whereas the fast path code has CPU to spare. Further, the NIO LAN card latency problem, which would prevent even an infinitely fast SPU from matching the single-user performance of the 486, is less apparent under load. Whereas the 486, with its fast LAN card, degrades with each new load, on MPE/iX the bottleneck is the LAN card and the CPU is able to keep up with the load in these tests.

Interpretation

Description of Netware/iX Fast Path Code

Briefly, the fast path code implements a scheme similar to native Netware itself to gain performance: large amounts of file data are cached in physical memory, eliminating the need to access the file system for most reads and writes. On MPE/iX this is implemented as an extension to the LAN driver through a modification of the LAN Access (LA) interface. File reads and writes occur against the cached file data in physical memory rather than against the file on disc. Because the fast path code exists as an extension to the LAN driver, access to the cached file data occurs on the Interrupt Control Stack (ICS), outside of the process environment of MPE, eliminating dispatcher calls and preemption by MPE. The net result is that access for reads and writes is very fast.

In addition to the file access improvements, Quest modified their code to eliminate some of the previously known bottlenecks. These improvements were primarily in directory access, caching disc requests, and pre-allocating file opens. No changes were made to the SPX Interprocess Communication (IPC) API or to the print spooling and sharing features.
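To make the fast path / slow path split concrete, here is a minimal sketch of the decision described above. It is purely illustrative: the structure and function names (connection_cache, cached_read, slow_path_read) are invented for this example and do not correspond to the actual LAN driver or LA interface code, and a user-level C program obviously cannot reproduce the ICS environment in which the real code runs.

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Illustrative only -- names and layout are invented for this sketch.
     * A per-connection window of file data pinned (frozen) in physical
     * memory; in the product the size of this window is the globally
     * configured per-client allocation. */
    struct connection_cache {
        long   file_offset;   /* file offset of the first cached byte */
        size_t length;        /* bytes of file data held in memory    */
        char  *data;          /* the frozen buffer                    */
    };

    /* Stand-in for the slow path: the normal route through the MPE file
     * system (process environment, dispatcher, disc I/O). */
    static size_t slow_path_read(long offset, void *buf, size_t len)
    {
        (void)buf;
        printf("  -> slow path (file system) for offset %ld\n", offset);
        return len;   /* pretend the file system satisfied the request */
    }

    /* If the request falls entirely inside the cached window, satisfy it
     * from memory; otherwise fall back to the slow path. */
    static size_t cached_read(struct connection_cache *cc,
                              long offset, void *buf, size_t len)
    {
        if (offset >= cc->file_offset &&
            offset + (long)len <= cc->file_offset + (long)cc->length) {
            memcpy(buf, cc->data + (offset - cc->file_offset), len);  /* fast path */
            return len;
        }
        return slow_path_read(offset, buf, len);                      /* slow path */
    }

    int main(void)
    {
        static char window[50 * 1024];            /* 50 KByte per-client window */
        struct connection_cache cc = { 0, sizeof window, window };
        char buf[512];

        /* A read inside the window is served from memory; a read beyond it
         * drops to the slow path. */
        printf("read @ 4096:   %lu bytes\n",
               (unsigned long)cached_read(&cc, 4096, buf, sizeof buf));
        printf("read @ 400000: %lu bytes\n",
               (unsigned long)cached_read(&cc, 400000, buf, sizeof buf));
        return 0;
    }

The point of the sketch is the branch: as long as a client's accesses stay within its frozen window, the server never enters the process environment; as soon as they fall outside it, the slow path and its much higher cost are taken. The sections below turn this behaviour into tuning advice.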
Description of test environment

The full environment for testing is described in Appendix A below. In brief, there is a single PC, in this case an IBM PS/2 Model 70, which measures its individual throughput while other PCs add load to the server. The throughput of the measurement PC is the number reported above. As more PCs add load to the server, the server attempts to meet the increased load requirements, reducing the service provided to the measurement PC. This is seen as degraded throughput performance at the measurement PC, as reported above.

Factors which may change the observed performance

Memory allocated per client connection. One key factor in gaining the maximum performance is the amount of memory dedicated to each client connection. The amount of physical memory allocated to each client is globally configurable (one setting applies to all clients). Reducing the frozen size below the working set size will cause slower performance, as the server must go into the process environment to update the working set. Two factors are at work: first, this slows the server down because the OS overhead is now much larger; second, the other processes on the system with the same or higher priority are now able to contend with the server. This may or may not be desirable, depending on how the server and the system are tuned and on the overall system goals.

Single user throughput will not meet the single user throughput of the Intel server. The latency of the NIO LAN card is greater than the entire LAN and processing time of the 486. This means that in the single user case, an NIO system will never meet the single user performance of the 486.

Conditions under which better performance will be seen

Strictly read/write. Since the Netware/iX fast path code deals only with the read/write path and files cached in memory, applications which open files and leave them open for a long time will have better performance than applications which open many small files and jump back and forth. Windows is an example of a program which opens many small files.

Working set of the cached file matches the amount of memory allocated by Netware/iX. Since each client connection is allocated a block of physical memory for file caching, better performance will be seen when the size of this block of memory matches the working set of the files opened. For example, if the file being accessed is 500 KBytes but only 50 KBytes are consistently accessed, then allocating the smaller amount will provide a good match between the client requirements and the server resources. However, if the entire file is being accessed, such as in the case of random access of a database file, then the larger amount would better suit the client access. One of the implications of a small memory allocation to the clients is heavier use of the "slow path" code to access the actual data file and bring it into memory. Two possible side effects would be increased CPU utilization and decreased throughput. (A rough arithmetic sketch of this effect appears after these conditions.)

Large reads (e.g. loading data, program load). Large data reads, such as during a program load or when reading a large spreadsheet into a program, will provide better performance than many small reads.

Large writes (e.g. writing a data file). Large data writes, such as writing a document out of a word processor, will provide better performance than many small writes to several files.

Conditions which will degrade performance results

Random access of a file beyond the amount of memory frozen for the client. Where file access requires a "search", or where file access occurs outside of the data cached in memory, the "slow path" code is invoked, resulting in slower performance and increased CPU utilization.

Directory operations (e.g. new directory, new file, moving files, copying directory structures). All directory operations use the "slow path" code, which performs actual disc I/O and is not cached. Heavy usage of directory calls, such as copying an entire directory structure, will result in decreased performance.

API usage. No optimization of the SPX or NetBIOS APIs was done in this project. The APIs use the previous version of the code and will result in overall slower performance due to the increased load that they put on the system.

Print spooling. No optimization of printer spooling was done in this project. Heavy use of the print spooler will result in overall slower performance due to the increased load that it puts on the system.
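The 500 KByte / 50 KByte example above can be turned into rough arithmetic. The sketch below estimates the effective transfer rate when random access spills beyond the per-client allocation. It is only an illustration: it assumes uniformly random accesses, and it borrows the measured no-load throughputs of the fast path and slow path products (283 and 115 KBytes/sec) purely as stand-ins for the fast and slow per-request service rates.

    #include <stdio.h>

    /*
     * Rough illustration of the working-set effect described above.  Assumes
     * uniformly random access over the file's working set, and borrows the
     * measured no-load throughputs (283 KB/s fast path, 115 KB/s slow path)
     * purely as stand-ins for the fast and slow per-request service rates.
     */
    static double effective_kbps(double alloc_kb, double working_set_kb,
                                 double fast_kbps, double slow_kbps)
    {
        /* Fraction of accesses that land inside the frozen window. */
        double hit = (working_set_kb <= alloc_kb) ? 1.0 : alloc_kb / working_set_kb;

        /* Time-weighted (harmonic) combination of fast and slow service. */
        return 1.0 / (hit / fast_kbps + (1.0 - hit) / slow_kbps);
    }

    int main(void)
    {
        /* Working set fits the 50 KB allocation: everything stays on the fast path. */
        printf("50 KB alloc,  50 KB working set: %.0f KB/s\n",
               effective_kbps(50, 50, 283, 115));

        /* 500 KB accessed at random through a 50 KB window: about 90% of the
         * requests take the slow path, so throughput collapses toward it. */
        printf("50 KB alloc, 500 KB working set: %.0f KB/s\n",
               effective_kbps(50, 500, 283, 115));
        return 0;
    }

The exact numbers are not meaningful; the point is that once random access spills well beyond the frozen window, throughput collapses toward the slow path figure, which is why matching the per-client allocation to the real working set matters.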
Extrapolations

Scalability of Higher Capacity SPUs. Given that the throughput degradation on the 917 was directly related to CPU consumption, we anticipate that Netware/iX will be highly scalable to higher capacity SPUs. The effect the user will see, however, is not increased speed but increased capacity. With the 967 SPU (the 2.3 Nova SPU), we would expect to see commensurately lower CPU utilization and less degradation in the benchmark environment. This would result in increased capacity. Again, the NIO LAN card latency would prevent improved single user throughput from being seen on the higher capacity SPUs.

Appendix A: Description of Test Platforms

Load Measuring Client
  IBM PS/2 Model 70
  8 Mbytes RAM
  3COM 3C523 LAN card

Vectra 486 Server, 25 MHz [1]
  Netware 386 version 3.11
  12 Mbytes of RAM
  ISA ESDI disc controller [2]
  1 EISA LAN card [3]

HP 3000 917 Server (This server used the Nova 1.0 board, although the chassis and memory configuration may have been closer to the standard 947 configuration.)
  Netware/XL A.01.01, based on Portable Netware 3.01b, for Slow Path results
  Netware/XL A.01.08 002, based on Portable Netware 3.01b and MPE/iX 3.1, for Fast Path results
  40 Mbytes of RAM
  32 Kbytes I cache
  64 Kbytes D cache
  1 NIO LAN card

HP 9000 807 Server
  Netware/9000, based on Portable Netware 3.01b
  48 Mbytes of RAM

HP 9000 720 Server
  Netware/9000, based on Portable Netware 3.01b
  48 Mbytes of RAM

Load Producing Clients
  1st load: Compaq 386/20 or Vectra 386/20N
  2nd load: Compaq 386/25 or Vectra RS/25C
  3rd load: Vectra QS/20
  4th load: Vectra QS/20
  5th load: Vectra ES/12
  6th load: Vectra Classic
  Loads 7 through 10: whatever PCs we can find, including a Compaq, a Vectra 486, and Vectra Classics

Appendix B: Description of Benchmark Tests

Quoting from the benchmark documentation: "The PC Magazine LAN Benchmark tests exercise and evaluate networks using tasks identical to those performed by real applications. Please note that the results you get with these and all other performance tests are only comparable to other tests run under identical conditions. The CPU power of the PCs [both servers and clients] running this software is an often-overlooked factor that is critical to the results you receive. When you run these tests, one network client PC runs the Evaluation Module of this program while all other PCs run a selection from the Load Module. This architecture provides a measurement of what one user would experience on a crowded network. Remember that just two or three fast PCs running Load programs can use up a large percentage of the available bandwidth in any 10-16 megabit per second network."

From other sources, we've determined that one load client is roughly equivalent to 10 to 20 "real" clients on the network. The load clients are running far faster than any actual user would be able to work. The best way to look at these numbers is to compare similar test conditions across platforms. However, before making direct comparisons with other PC Magazine results, you should remember that the mix of slow and fast clients will affect the results seen at the server. In order to gain 100% accuracy, the exact test setup of the article should be duplicated. For example, a slow client will make a slow server look better, since throughput is measured at the client and a slow client would be unable to take advantage of the full potential of a fast server. It is our opinion that these results are sufficient for internal use and for comparisons between HP's products and the numbers quoted in PC Magazine. However, more accurate testing would be required if we were going to publish these results.

Bibliography: Other Sources for NOS Performance Information

DEC VAX as AppleShare Server - MacUser (an article that appeared sometime within the last six months). The VAX was far slower than the Macintosh II used as a server.

Other Portable Netware Implementations - PC Week; August 19, 1991. The article compared Portable Netware running on NCR, Prime, and Altos Unix systems. The general result was that the Compaq 486/33 MHz was running about 5 times faster than the Unix systems. Note that these tests were not the PC Magazine tests described here.

LAN Manager 2.0 versus Netware 386 3.1 - PC Magazine; December 11, 1990. This article compared the latest versions (as of that writing) of Netware and LAN Manager. The results show the Netware platform (Compaq dual 386/33) under no load performing at about 281 KBytes/sec, and LM 2.0 at 200 KBytes/sec.
Under 5 loads, Netware performed at 270 KBytes/sec and LM 2.0 at 183 KBytes/sec. The differences between those numbers and our test results lie in the servers used (Compaq dual 386 versus Vectra 486), the version of Netware (3.1 versus 3.11, which did improve performance), and the difference in clients.

Netware 386 as a Macintosh AppleShare Server - MacUser; November 1991. Netware 386 running on a Compaq 486/33 was 6 times faster than a Macintosh IIfx (the current high end Macintosh as of the writing of the article) in the same role.

Intel-based Superservers - Data Communications; "Can Superservers Scale Up to Enterprise Status?", July 1991. The article compares the then-current list of "superservers" and asks whether they can scale up to handle the data loads then being handled by mini- and mainframe computers.

Notes

[1] Novell recommends a 33 MHz 486 as a departmental LAN server.
[2] This is considered to be a relatively slow disk. If further testing is requested, we should move to a SCSI-based system. Novell recently reported on a Vectra 486/33 MHz with 32-bit SCSI controllers at the Netware for Unix conference in January.
[3] This is considered the "fast" LAN card.